The Glass-like Structure of Globular Proteins and the Boson Peak 
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Vibrational spectra of proteins and topologically disordered solids display a common anomaly 
at low frequencies, known as Boson peak. We show that such feature in globular proteins can be 
deciphered in terms of an energy landscape picture, as it is for glassy systems. Exploiting the tools 
of Euclidean random matrix theory, we clarify the physical origin of such anomaly in terms of a 
mechanical instability of the system. As a natural explanation, we argue that such instability is 
relevant for proteins in order for their molecular functions to be optimally rooted in their structures. 
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Proteins are characterized by mechanically stable, 
unique native structures that bear a precise relation with 
their biological functions. Yet, in most cases, specific 
functionality is accompanied by large-amplitude dynam- 
ical conformational changes that require high flexibil- 
ity [l| . Protein structures are complex, hierarchical ones, 
characterized by short-range order and overall spatial 
correlations that bear strong similarities with those of 
glassy materials Q. In actual fact, proteins and glasses 
share many physical properties, such as peculiar relax- 
ation processes 0j and the occurrence of a dynamical 
transition as revealed by the temperature dependence of 
the atomic mean square displacements (MSD) QHOi 

Interestingly, there exists a remarkable similarity of the 
Raman and neutron-scattering spectra ofproteins with 
those of glasses and super-cooled liquids i.e. a peak 
that develops at low temperatures in the low-frequency 
regions. Such anomaly, known as Boson peak (BP), also 
shows up in the experimentally determined density of 
states when divided by the Dcbyc law, i.e. g(ui)/u> 2 
Several models have been proposed for the explanation 
of the BP in proteins, among which the phonon-fracton 
model , and the log- normal distribution model || . 

The BP is, on the other hand, a universal feature of 
many glassy systems 0|. In this context, several possi- 
ble explanations have been proposed, from the two-level 
system scenario |ldj to localized modes arising from a 
strong scattering of the phonons by the disorder fTl| . 
from "glassy" van Hove singularities ^3] to a mechanical 
instability [13j. Recently, the possibility that a BP may 
be a general feature of weakly connected systems has also 
been investigated 0, 0] . 

In a different analytical framework [l^, the excess of 
low-energy modes with respect to the Debye behaviour is 
viewed as a symptomatic effect of the topological phase 
transition which is conjectured to happen in glasses at 
low temperatures |l3j . Recently, a quantitative descrip- 
tion of the BP phenomenology has been given whithin 
the formalism of the Euclidean Random Matrix (ERM) 
theory whose predictions have been confirmed by 



numerical simulations on realistic glass-forming systems, 
emphasizing its universal character [l7j . 

In this Letter, we show that the emergence of a BP in 
globular proteins is the signature of a structural insta- 
bility of the saddle-phonon kind akin to that predicted 
within the ERM theory of glasses. Remarkably, our 
explanation allows for a natural interpretation of such 
instability in proteins in terms of the mutual relations 
among their structure, dynamics and biological function. 

To investigate the vibrational properties of a given 
globular protein, we coarse-grain its structure at the 
amino-acid level and build the associated elastic network 
(EN). The ap plic ation of EN models to proteins is rela- 
tively recent yjj, since it has commonly been assumed 
that little structural detail could be given up in order to 
model their complex energy landscapes. However, there 
is now strong evidence that most features of the large- 
and medium-scale dynamics of proteins' fluctuations 
around their native states, related to function and stabil- 
ity, can be successfully reproduced b y si mple harmonic 
interactions between amino-acids |l9l |2(X l2ll \22L 1 12 a | . In 
view of the BP phenomenology, it is important to men- 
tion the growing consensus that an explanation in glas ses 
could be found within a purely harmonic context |2 ll] . 

In the framework of EN models, the potential energy 
is written as a sum of pair-wise harmonic potentials, 
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being the position of the i— th par- 



ticle, 7y its equilibrium position and kij the stiffness of 
the spring connecting particles i and j. More precisely, 
the vector r*j represents the instantaneous position of the 
a-carbon of the i-th amino-acid, its position in the 
native state as determined from X-ray crystallography 
or Nuclear Magnetic Resonance, and fey can take differ- 
ent functional forms, such as kij = k6{t c — |r*/ 0) 
(sharp cutoff mod el 12(1 1 or kij = k exp(— \f^° 
(Gaussian model |2lJ), which is the one we adopt here 
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FIG. 1: Plot of the pair correlation function for Serum Al- 
bumin (N = 578, solid line) and for a collection of an equal 
number of residues uniformly distributed in its equivalent el- 
lipsoid (dashed line). Right inset: a magnification of the tails 
in lin-log scale. Left inset: Average connectivity vs cutoff 
distance (symbols) and cubic fit (solid line). 

The parameter k sets the physical units for force con- 
stants, and can be fixed by requiring the theoretical 
MSDs to match the experimental ones as determined 
from X-ray spectra [l9j . 

In the harmonic approximation, the total potential en- 
ergy J[J is the quadratic form V({r}) = ^f T Kf, where 
the contact matrix Kj a ,.j/3 (a, (3 = 1, 2, 3) is the Hessian of 
the function JQ) evaluated at the equilibrium structure. 
Were the position vectors in the native structure r/ ' ar- 
ranged at random, IK would exactly fall in the class of 
Euclidean Random Matrices. Even if protein structures 
are surely not random, an analysis of the pair correlation 
function g(r) reveals interesting features. In Fig. ^ we 
plot g(r) for Serum Albumin, a relatively large globular 
protein whose equivalent ellipsoid [2^] has principal radii 
measuring 2.3, 3.7 and 4 nm, and for an identical number 
of residues uniformly distributed within such ellipsoid. 
The comparison shows that the protein structure is char- 
acterized by two well-defined coordination shells, namely 
the nearest neighbors at fixed distance along the chain 
and the next-nearest off-chain neighbors, including the 
pairs belonging to alpha helices and those lying at turn- 
ing regions, such as loops. After a third, less resolved 
shell all pair-wise spatial correlations are lost. We re- 
peated this analysis for several proteins and always found 
that the second and the third peaks are always related 
to the presence of secondary motifs as well as to the in- 
trinsic flexibility of the peptide chain, while beyond such 
range spatial correlations are absent. This fact is a clear 
indication that, as far as large-scale structural properties 
are involved, proteins are well approximated by random 
assemblies of amino-acids with specified density. 

The analogy between protein structures and disordered 
systems with no long-range order suggests that a common 



mechanism might be responsible for the emergence of the 
BP in both cases. In topologically disordered solids, this 
anomaly appears upon increasing the temperature or, as 
observed for example in Silica, upon lowering the density. 
In the present case, we are dealing with proteins, i.e. ob- 
jects whose equilibrium structure is fixed by the biolog- 
ical function. However, changes in the particle density 
may still be simulated by resorting to the free parameter 
r c . In the framework of EN models, r c sets the range 
of inter-particle interactions and should in principle be 
tuned by fitting the low-frequency portion of experimen- 
tal spectra at temperatures below the dynamical tran- 
sition, where the protein vibrates harmonically within a 
local minimum. The usual alternative is to compare with 
spectra as determined by all-atom force fields [il]]. By 
doing t his, one obtains p c 3 A in an all-atom represen- 
tation |2jJ, which coarse-grains to r c (A^) 1 / 3 /^ rs 8 
A when the average number of atoms per amino-acid 
{N a ) ~ 18 is introduced. Interestingly, by its very defini- 
tion, the parameter r c also allows to regulate an effective 
local density of the system by tuning the average con- 
nectivity (c) = 3^ J2i=i E a =i ^ia,ia- By decreasing the 
cutoff r c , the average number of neighbors per residue 
diminishes accordingly. Thus, a local measure of com- 
pactness may be introduced that is proportional to (c). 
It can be shown that varying r c induces a change in the 
connectivity that scales with the interaction volume r 3 up 
to finite-size 0(r c ) corrections (see left inset in Fig. 
This means that we can study the spectral features of 
a given protein structure with the additional degree of 
freedom of varying density by simply changing the inter- 
action cutoff r c , which thus plays in this context the role 
of a control parameter. 

The vibrational spectrum of a protein for a certain 
value of the parameter r c is obtained by diagonalizing 
the contact matrix. However, especially for small pro- 
teins, the finite number of residues makes it difficult to 
analyze the low-frequency features of the spectra. In or- 
der to circumvent this problem, we generated a number 
of different conformers for each of the analyzed structures 
such that all of them be by construction compatible with 
the atomic MSDs as specified by the native contact ma- 
trices. More precisely, if we write the coordinates of a 
given conformcr as p <0) = r m +Sr, then it is sufficient to 
take 5r = U c, where U is the matrix of eigenvectors of IK 
and the 3A" — 6 coefficients are drawn from as many 
one-dimensional Gaussian distributions with zero mean 
and standard deviations Uk = \/ —k B T/ Afc, \k = —w\ 
being the eigenvalues of the contact matrix K. This pro- 
cedure provides a simple means to construct an arbitrary 
number of conformations that are dynamically equivalent 
to the native one in the harmonic approximation. 

In Fig. we plot g(u) and g(u})/oj 2 for several values 
of the cutoff r c for two representative proteins of dif- 
ferent size. Similar results were obtained for a choice 
of other proteins. A shoulder manifestly appears in the 
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FIG. 2: Boson peak analysis for two globular proteins of dif- 
ferent size. Left panels: Serum Albumin (1A06), N = 578 
residues. Right panels: Ubiquitin (1UBI), N = 76 residues. 
The four upper panels show the density of states for differ- 
ent values of r c (for 1000 thermal replicas). In the four lower 
ones, we show the fits to the BP frequency and height with the 
mean field expressions @. The best-fit results are: r* — 5.7 
A (Serum Albumin) and r* c = 3.5 A (Ubiquitin). The physical 
units for frequencies were obtained with r c — 8 A. 

low-frequency region as r c is reduced (see upper panels 
in Fig |2J) , and eventually a divergence develops if r c is 
decreased below a critical value. The origin of such peak 
can be uncovered by tracking its position wbp and height 
/ibp as r c , i.e. our effective density, decreases. From the 
lower panels of Fig [21 one can clearly appreciate that the 
scaling followed by lobp and h^p is very well interpolated 
by the analytical functional forms predicted by the ERM 
theory in the mean- field approximation |lfjj . i.e. 

ccbp ~ (r c - r*) Q , h BP ~ (r c - r' c )- p (2) 

with a = 1 and (3 = 1/2. Therefore, our analysis strongly 



FIG. 3: Plot of the level spacing statistics of Ubiquitin for 
different values of the cutoff r c . The Wigner-Dyson (thick 
solid line) and Poisson (dashed line) statistics, which describes 
totally uncorrelated spectra, are also shown for comparison. 
Upper inset: Jo = (s 2 ) /2 is plotted versus r c . The dashed line 
represents the value expected for a fully extended spectrum. 
Lower inset: level spacing statistics for frequencies U) < 2.5 
meV. The solid line is a plot of the Wigner surmise. 



suggests that the BP in protein structures at low den- 
sities can be interpreted in terms of a topological in- 
stability utterly analogous to the one found in glasses 
and glass-forming liquids |l7| . More rigorously, as it 
is the case for the Gaussian model in glasses, the BP 
should be interpreted as a precursor of the transition 
within a model that by definition becomes meaningless 
at the critical point. This is precisely what happens 
in our case, at an interaction range below which pro- 
tein structures start unfolding. We also stress that the 
shift of o>bp towards zero frequency and the divergence 
of the BP height as the systems loose rigidity is a spec- 
tral feature equally unveiled within different theoretical 
approaches [H [H Q [13 . 

It is also instructive to study the localization properties 
of typical ensembles of spectra through the level-spacing 
statistics P(s) [2q| . As an example, we plot the results 
obtained for Ubiquitin in Fig. Overall, the distribu- 
tion is very well described by a Wigner law, which holds 
for fully extended spectra. As we decrease the cutoff, a 
small contribution from localized modes is observed, as 
the measure of Jo = (s 2 )/2 shows (upper inset of Fig.OJ). 
Otherwise, Jo should be close to 1 in the case of a lo- 
calized spectrum, which is never the case. A more re- 
fined analysis |30| performed on several proteins clearly 
shows that the only localized modes are due to the tail of 
the spectrum at large frequencies, much alike structural 
glasses 0, This conclusion, further confirmed by 

the level spacing statistics from the low-frequency por- 
tion of the spectra (lower inset of Fig- , rules out the 
presence of localized modes in the BP region. 
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Protein 


N 


V 


(a + p) 


K (A) 


Insulin 


51 


0.20 


0.53 


4.57 


Protein G 


56 


0.21 


0.70 


3.64 


Ubiquitin 


71 


0.20 


0.46 


3.53 


PDZ binding domain 


85 


0.21 


0.55 


4.03 


Lysozyme 


162 


0.17 


0.74 


4.27 


Adenylate Kinase 


214 


0.12 


0.64 


7.85 


LAO 


238 


0.16 


0.60 


5.44 


CYSB 


260 


0.17 


0.59 


4.70 


PBGD 


296 


0.16 


0.60 


3.70 


Thermolysin 


316 


0.18 


0.53 


4.55 


HSP70 ATP-binding domain 


382 


0.15 


0.66 


5.28 


Fab- fragment 


437 


0.13 


0.48 


5.70 


Serum Albumin 


578 


0.12 


0.70 


5.70 


Correlation with r* c 


0.45 


0.82 


0.17 


1 



TABLE I: Correlation of r"" c with structural parameters. 



our results agree with recent estimates of the spectral di- 
mension of globular proteins, whose non-Debyc behavior 
has been interpreted in terms of a vibrational instability 
of the Peierls-Landau type [2^| . 

Summarizing, in this Letter we have provided com- 
pelling evidence of the equivalence of the Boson peak 
phenomenon in globular proteins and glasses. Our anal- 
ysis suggests that a topological instability of the saddle- 
phonon type in proteins reflects the balance imprinted 
in their structures between being able of rapidly access- 
ing different minima in the native energy landscape while 
keeping a relative mechanical rigidity. 

We thank L. Casetti, O. Martin, M. Mczard and G. 
Parisi for interesting discussions. S. C. also thanks the 
EPFL for its hospitality. S. C. is supported by ECHP 
Program, contract HPRN-CT-2002-00319 (STIPCO). 



The origin of a precursory feature of a topological in- 
stability in proteins can be formally understood by re- 
calling that their structures are those of folded polymers. 
If the interaction cutoff r c is lowered below the first off- 
chain coordination shell, native conformations lose their 
folded nature and become more and more akin to liquids. 
In fact, we argue that the appearance of the BP precisely 
anticipates such inherent instability before the critical 
cutoff is reached. Accordingly, the best-fit values of r* 
for all the analyzed structures does never exceed the first 
off-chain coordination shell (see Fig.[5J). Keeping in mind 
that the optimal value of r c is around 8 A, i.e. above its 
critical value, our results suggest that protein structures 
express an inherent trade off between spatial properties 
of liquids, i.e. increased degree of mobility, and the ne- 
cessity of maintaining a certain structural stability. In- 
terestingly, from an extensive analysis on a selection of 13 
proteins, we find that r* is substantially anti-correlated 
with the packing fraction p = 4/3(A r /I/)(d /2) 3 , i.e. a 
measure of global compactness, whereas weak correlation 
is found with indicators of local stability, such as the con- 
tent of a helices and (3 sheets. Here N and V are the 
number of residues and the volume, while do — 3.83 A is 
the inter-residue distance along the main chain. More- 
over, we also find a positive correlation between r* and 
N, which may signal the larger mechanical stability of 
smaller proteins (see Table HJ. 

The above conclusions may be interpreted by regarding 
proteins as molecular machines bound to keep a specified 
geometry in order to perform their biological function, 
yet preserving a high degree of structural flexibility in or- 
der to efficiently explore different conformational states. 
In this sense, the mechanical instability underlying the 
emergence of a BP appears to be a universal signature 
of their engineered ability to easily travel between adja- 
cent local minima in their native states. We note that 
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